409 research outputs found

    Predicting protein subcellular locations using hierarchical ensemble of Bayesian classifiers based on Markov chains

    BACKGROUND: The subcellular location of a protein is closely related to its function. It would therefore be worthwhile to develop a method that predicts the subcellular location of a protein when only its amino acid sequence is known. Although many efforts have been made to predict subcellular location from sequence information alone, further research is needed to improve prediction accuracy. RESULTS: A novel method called HensBC is introduced to predict protein subcellular location. HensBC is a recursive algorithm that constructs a hierarchical ensemble of classifiers. The classifiers used are Bayesian classifiers based on Markov chain models. We tested our method on six different datasets, including a Gram-negative bacteria dataset, a dataset for discriminating outer membrane proteins, and an apoptosis protein dataset. We observed that our method predicts subcellular location with high accuracy. Another advantage of the proposed method is that it can improve the prediction accuracy for some classes with few training sequences, and it is therefore useful for datasets with an imbalanced class distribution. CONCLUSION: This study introduces an algorithm that uses only the primary sequence of a protein to predict its subcellular location. The proposed recursive scheme represents an interesting methodology for learning and combining classifiers. Empirical results indicate that the method is computationally efficient and competitive with previously reported approaches in terms of prediction accuracy. The code for the software is available upon request.
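A Bayesian classifier based on Markov chain models can be sketched as follows. This is a minimal illustration under stated assumptions: one first-order Markov chain per location class over the 20 standard amino acids, Laplace pseudocounts, and class priors estimated from training frequencies. The class name `MarkovChainBayesClassifier` is hypothetical, and the sketch covers only a single base classifier, not the recursive hierarchical ensemble that HensBC builds from such classifiers.

```python
import math
from collections import Counter

# Assumed alphabet: the 20 standard amino acids; any other symbol is dropped.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


class MarkovChainBayesClassifier:
    """Illustrative first-order Markov chain model per class, combined via Bayes' rule."""

    def __init__(self, pseudocount=1.0):
        self.pseudocount = pseudocount

    @staticmethod
    def _clean(seq):
        return [aa for aa in seq.upper() if aa in AMINO_ACIDS]

    def fit(self, sequences, labels):
        counts = Counter(labels)
        total = len(labels)
        self.classes = sorted(counts)
        self.log_prior = {c: math.log(counts[c] / total) for c in self.classes}
        k, pc = len(AMINO_ACIDS), self.pseudocount
        self.log_init, self.log_trans = {}, {}
        for c in self.classes:
            init = Counter()
            trans = {a: Counter() for a in AMINO_ACIDS}
            for seq, label in zip(sequences, labels):
                residues = self._clean(seq)
                if label != c or not residues:
                    continue
                init[residues[0]] += 1
                for a, b in zip(residues, residues[1:]):
                    trans[a][b] += 1
            n_init = sum(init.values())
            # Laplace-smoothed log-probabilities for the first residue and each transition.
            self.log_init[c] = {
                a: math.log((init[a] + pc) / (n_init + k * pc)) for a in AMINO_ACIDS
            }
            self.log_trans[c] = {}
            for a in AMINO_ACIDS:
                row = sum(trans[a].values())
                self.log_trans[c][a] = {
                    b: math.log((trans[a][b] + pc) / (row + k * pc)) for b in AMINO_ACIDS
                }
        return self

    def log_posterior(self, seq, c):
        """Unnormalized log P(class | seq) = log P(class) + log P(seq | class)."""
        residues = self._clean(seq)
        score = self.log_prior[c]
        if residues:
            score += self.log_init[c][residues[0]]
            score += sum(self.log_trans[c][a][b] for a, b in zip(residues, residues[1:]))
        return score

    def predict(self, seq):
        return max(self.classes, key=lambda c: self.log_posterior(seq, c))
```

For example, after `clf = MarkovChainBayesClassifier().fit(["AAAAAA", "AAAAAC", "CCCCCC", "CCCCCA"], ["cytoplasm", "cytoplasm", "membrane", "membrane"])` on these toy sequences, `clf.predict("AAAA")` favours the class whose chain has seen many A-to-A transitions. Working in log space avoids numerical underflow on long sequences.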

    Conserved host-pathogen interactions identify novel treatment options in betacoronavirus infections

    Hypoxia is an underlying pathophysiological condition of a variety of devastating diseases, including acute ischemic stroke (AIS). We are faced with limited therapeutic options for AIS patients, and even after successful restoration of cerebral blood flow, the poststroke mortality is still high. More basic research is needed to explain mortality after reperfusion and to develop adjunct neuroprotective therapies. Drosophila melanogaster (D.m.) is a suitable model to analyze hypoxia; however, little is known about the impacts of hypoxia and especially of the subsequent reperfusion injury on the behavior and survival of D.m. To address this knowledge gap, we subjected two wild-type D.m. strains (Canton-S and Oregon-R) to severe hypoxia (<0.3%

    An ultrasensitive sorting mechanism for EGF receptor endocytosis

    Background: The EGF receptor has been shown to internalize via clathrin-independent endocytosis (CIE) in a ligand-concentration-dependent manner. From a modeling point of view, this resembles an ultrasensitive response, that is, the ability of a signaling network to suppress its response to low input values and to increase it to a pre-defined level for inputs exceeding a certain threshold. Several mechanisms that generate this behaviour have been described theoretically; however, their underlying assumptions have not been experimentally demonstrated for the EGF receptor internalization network. Results: Here, we present a mathematical model of receptor sorting into alternative pathways that explains the EGF-concentration-dependent response of CIE. The described mechanism involves a saturation effect of the dominant clathrin-dependent endocytosis pathway and implies distinct steady states into which the system is forced for low vs. high EGF stimulation. The model is minimal, since no experimentally unjustified reactions or parameter assumptions are imposed. We demonstrate the robustness of the sorting effect under large parameter variations and give an analytic derivation of the alternative steady states that are reached. Further, we describe how the model extends to more than two pathways, which might play a role in contexts other than receptor internalization. Conclusions: Our main result is that a scenario in which different endocytosis routes consume the same form of receptor is consistent with the observed clear-cut, stimulus-dependent sorting. This is especially important since no receptor modification that discriminates between the pathways has been found. The model is not restricted to EGF receptor internalization and might account for ultrasensitivity in other cellular contexts.
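The saturation mechanism can be illustrated with a deliberately simplified toy model, which is an assumption of this sketch and not the model from the study: active receptor R is produced at a rate `s` proportional to the EGF stimulus and consumed by two routes, a saturable clathrin-dependent route with Michaelis-Menten flux `vmax * R / (km + R)` and a linear clathrin-independent route with flux `k_cie * R`. The steady state solves s = vmax * R / (km + R) + k_cie * R. Because both routes consume the same receptor form, CIE carries almost nothing until `s` exceeds the CME capacity `vmax`. All function names and parameter values are hypothetical.

```python
def steady_state_receptor(s, vmax, km, k_cie, iterations=200):
    """Solve s = vmax*R/(km + R) + k_cie*R for R >= 0 by bisection (toy model)."""
    if s <= 0:
        return 0.0

    def net_rate(r):
        # dR/dt at receptor level r: production minus CME and CIE consumption.
        return s - vmax * r / (km + r) - k_cie * r

    # net_rate(0) = s > 0, net_rate(s / k_cie) < 0, and net_rate decreases in r,
    # so the unique root lies in [0, s / k_cie].
    lo, hi = 0.0, s / k_cie
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if net_rate(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def cie_fraction(s, vmax=1.0, km=0.01, k_cie=1.0):
    """Share of the total internalization flux that goes through the CIE route."""
    r = steady_state_receptor(s, vmax, km, k_cie)
    return k_cie * r / s


for stimulus in (0.25, 0.5, 0.9, 1.1, 2.0, 4.0):
    print(f"s={stimulus:4.2f}  CIE fraction={cie_fraction(stimulus):.3f}")
```

With a small `km` the CME route behaves almost like a fixed-capacity sink, so the CIE share stays near zero below `s = vmax` and rises steeply above it; increasing `km` smooths this switch.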

    Cross-platform analysis of cancer microarray data improves gene expression based classification of phenotypes

    BACKGROUND: The extensive use of DNA microarray technology in characterizing the cell transcriptome is producing an ever-increasing amount of microarray data from cancer studies. Although these studies address similar questions for the same type of cancer, a comparative analysis of their results is hampered by heterogeneous microarray platforms and analysis methods. RESULTS: In contrast to a meta-analysis approach, where results of different studies are combined at an interpretative level, we investigate how to directly integrate raw microarray data from different studies for supervised classification analysis. We use median rank scores and quantile discretization to derive numerically comparable measures of gene expression from different platforms. These transformed data are then used to train classifiers based on support vector machines. We apply this approach to six publicly available cancer microarray gene expression data sets, which consist of three pairs of studies, each examining the same type of cancer: breast cancer, prostate cancer or acute myeloid leukemia. For each pair, one study was performed with cDNA microarrays and the other with oligonucleotide microarrays. In each pair, high classification accuracies (> 85%) were achieved in a cross-validation analysis with training and testing on data instances randomly chosen from both data sets. To exemplify the potential of this cross-platform classification analysis, we use two leukemia microarray data sets to show that an integrated analysis selects genes important to the biology of leukemia that are missed in either single-set analysis. CONCLUSION: Cross-platform classification of multiple cancer microarray data sets yields discriminative gene expression signatures that are found and validated on a large number of microarray samples generated by different laboratories and microarray technologies. Predictive models generated by this approach are better validated than those generated on a single data set, while showing high predictive power and improved generalization performance.
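Quantile discretization can be sketched as a within-sample rank transform: each gene is assigned to one of `n_bins` bins according to its rank among all genes on the same array, so values on very different scales (for example, cDNA log ratios and oligonucleotide intensities) become directly comparable. This is a minimal sketch with the hypothetical function name `quantile_discretize`; it breaks ties by input order and does not reproduce the study's median rank score computation.

```python
def quantile_discretize(values, n_bins):
    """Assign each value a bin index 0..n_bins-1 based on its rank within the sample.

    Ties are broken by input order (Python's sort is stable).
    """
    if n_bins < 1:
        raise ValueError("n_bins must be at least 1")
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    bins = [0] * n
    for rank, index in enumerate(order):
        # Ranks 0..n-1 are split into n_bins groups of (nearly) equal size.
        bins[index] = rank * n_bins // n
    return bins


# Two hypothetical arrays measuring the same four genes on very different scales.
cdna_log_ratios = [1.8, -2.1, 0.4, -0.3]
oligo_intensities = [5200.0, 40.0, 900.0, 310.0]
print(quantile_discretize(cdna_log_ratios, 2))    # [1, 0, 1, 0]
print(quantile_discretize(oligo_intensities, 2))  # [1, 0, 1, 0]
```

Because only ranks survive, any monotone platform-specific scaling is removed, at the cost of discarding the magnitude of differences between genes.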

    "gtrellis": an R/Bioconductor package for making genome-level Trellis graphics

    BACKGROUND: Trellis graphics are a visualization method that splits data by one or more categorical variables and displays subsets of the data in a grid of panels. Trellis graphics are broadly used in genomic data analysis to compare statistics over different categories in parallel and to reveal multivariate relationships. However, current software packages for producing Trellis graphics have not been designed with genomic data in mind and lack some functionality that is required for effective visualization of genomic data. RESULTS: Here we introduce the gtrellis package, which provides an efficient and extensible way to visualize genomic data in a Trellis layout. gtrellis provides highly flexible Trellis layouts that allow efficient arrangement of genomic categories on the plot. It supports multiple-track visualization, which makes it straightforward to visualize several properties of genomic data in parallel to explain complex relationships. In addition, gtrellis provides an extensible framework that allows adding user-defined graphics. CONCLUSIONS: The gtrellis package provides an easy and effective way to visualize genomic data and reveal high-dimensional relationships on a genome-wide scale. gtrellis can be flexibly extended and can thus also serve as a base package for highly specific purposes. gtrellis makes it easy to produce novel visualizations, which can lead to the discovery of previously unrecognized patterns in genomic data. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-016-1051-4) contains supplementary material, which is available to authorized users.

    Identifying essential genes in bacterial metabolic networks with machine learning methods

    BACKGROUND: Identifying essential genes in bacteria helps to identify potential drug targets and to understand the minimal requirements for a synthetic cell. However, experimentally assaying the essentiality of every coding gene is resource-intensive and not feasible for all bacterial organisms, in particular infective ones. RESULTS: We developed a machine learning technique that uses experimental data from genome-wide knock-out screens of one bacterial organism to infer the essential genes of a related bacterial organism. We used a broad variety of topological features, sequence characteristics and co-expression properties potentially associated with essentiality, such as flux deviations, centrality, codon frequencies of the sequences, co-regulation and phyletic retention. An organism-wise cross-validation on bacterial species yielded reliable results with good accuracies (area under the receiver operating characteristic curve of 75% to 81%). Finally, the method was applied to drug target prediction for Salmonella typhimurium. We compared our predictions with the viability of experimental knock-outs of S. typhimurium and identified 35 enzymes that are highly relevant as potential drug targets. Specifically, we detected promising drug targets in the non-mevalonate pathway. CONCLUSIONS: Using elaborate features that characterize network topology, sequence information and microarray data makes it possible to predict essential genes in a related query organism from a bacterial reference organism, without any knowledge of gene essentiality in the query organism. In general, such a method is useful for inferring drug targets when experimental data from genome-wide knock-out screens are not available for the organism under investigation.
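The organism-wise cross-validation and the AUC metric can be sketched as below. In this minimal illustration every gene is tagged with its source organism; each fold holds out all genes of one organism for testing and trains on the rest, and the AUC is computed as the probability that a randomly chosen essential gene scores higher than a randomly chosen non-essential one, with ties counting one half. The function names and data layout are hypothetical, and the classifier itself is left out.

```python
def leave_one_organism_out(organisms):
    """Yield (held_out_organism, train_indices, test_indices) for each organism."""
    for held_out in sorted(set(organisms)):
        train = [i for i, org in enumerate(organisms) if org != held_out]
        test = [i for i, org in enumerate(organisms) if org == held_out]
        yield held_out, train, test


def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (labels: 1 = essential, 0 = not)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical gene-to-organism assignment for five genes.
organisms = ["E. coli", "E. coli", "E. coli", "S. typhimurium", "S. typhimurium"]
for held_out, train_idx, test_idx in leave_one_organism_out(organisms):
    print(held_out, train_idx, test_idx)
```

Holding out whole organisms, rather than random genes, mimics the intended use case of predicting essentiality in an organism with no screen data. The quadratic pairwise AUC is chosen for clarity; a rank-based formula scales better to genome-sized gene sets.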

    EnrichedHeatmap: an R/Bioconductor package for comprehensive visualization of genomic signal associations

    Background: High-throughput sequencing data are increasing dramatically in volume. Thus, there is an urgent need for efficient tools that perform fast, integrative analysis of multiple data types. An enriched heatmap is a specific form of heatmap that visualizes how genomic signals are enriched over specific target regions. It is commonly used and is efficient at revealing enrichment patterns, especially in high-dimensional genomic and epigenomic datasets. Results: We present a new R package named EnrichedHeatmap that efficiently visualizes genomic signal enrichment. It provides advanced solutions for normalizing genomic signals within target regions and offers highly customizable visualizations. The major advantage of EnrichedHeatmap is its ability to conveniently generate parallel heatmaps together with complex annotations, which makes it easy to integrate and visualize comprehensive overviews of the patterns and associations within and between complex datasets. Conclusions: EnrichedHeatmap facilitates a comprehensive understanding of high-dimensional genomic and epigenomic data. Its power is demonstrated by visualizing the complex associations between DNA methylation, gene expression and various histone modifications.
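The idea of normalizing a signal over target regions can be sketched as follows: each target is split into an upstream flank, the target body and a downstream flank, each divided into a fixed number of windows, and every window receives the overlap-weighted mean signal, so regions of different lengths map to rows of equal length that can be stacked into a heatmap. This is a simplified illustration under stated assumptions (half-open intervals, uncovered bases counted as zero, body windows scaled to the target length); it is not the package's own normalization code, and the function names are hypothetical.

```python
def window_means(signal, start, end, n_windows):
    """Overlap-weighted mean of (start, end, value) intervals in equal windows of [start, end)."""
    width = (end - start) / n_windows
    means = []
    for i in range(n_windows):
        w_start = start + i * width
        w_end = w_start + width
        total = 0.0
        for s, e, value in signal:
            overlap = min(e, w_end) - max(s, w_start)
            if overlap > 0:
                total += value * overlap
        # Bases not covered by any interval contribute zero signal.
        means.append(total / width)
    return means


def normalize_target(signal, target_start, target_end, extend, n_flank, n_body):
    """One heatmap row: upstream flank + scaled target body + downstream flank."""
    upstream = window_means(signal, target_start - extend, target_start, n_flank)
    body = window_means(signal, target_start, target_end, n_body)
    downstream = window_means(signal, target_end, target_end + extend, n_flank)
    return upstream + body + downstream


# Hypothetical signal covering exactly one target region with value 2.0.
row = normalize_target([(100, 200, 2.0)], 100, 200, extend=50, n_flank=1, n_body=2)
print(row)  # [0.0, 2.0, 2.0, 0.0]
```

Every target yields `2 * n_flank + n_body` values regardless of its length, which is what lets rows from many regions share the columns of one heatmap.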

    Using gene expression data and network topology to detect substantial pathways, clusters and switches during oxygen deprivation of Escherichia coli

    BACKGROUND: Biochemical investigations over the last decades have produced an increasingly complete picture of cellular metabolism. To derive a systems view of metabolic regulation as cells adapt to environmental changes, whole-genome gene expression profiles can be analysed. Moreover, using a network topology based on gene relationships may help to interpret this vast amount of information and to extract significant patterns within the networks. RESULTS: Interpreting expression levels as pixels with grey-value intensities, and network topology as relationships between pixels, allows an image-like representation of cellular metabolism. While the topology of a regular image is a lattice grid, biological networks have a scale-free architecture, so advanced image processing methods such as wavelet transforms cannot be applied directly. In the study reported here, one-dimensional enzyme-enzyme pairs were tracked to reveal sub-graphs of a biological interaction network that showed significant adaptations to a changing environment. As a case study, the response of the hetero-fermentative bacterium E. coli to oxygen deprivation was investigated. With our novel method, we detected, as expected, an up-regulation of the pathways for hexose nutrient uptake and metabolism and for formate fermentation. Furthermore, our approach revealed a down-regulation of iron processing as well as an up-regulation of the histidine biosynthesis pathway. The latter may reflect an adaptive response of E. coli to an increasingly acidic environment caused by the excretion of acidic products during anaerobic growth in batch culture. CONCLUSION: Based on microarray expression profiling data of prokaryotic cells exposed to fundamental changes in treatment, our novel technique proved able to extract system changes across a rather broad spectrum of the biochemical network.
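A much simplified stand-in for tracking enzyme-enzyme pairs can be sketched as follows: keep only network edges whose two enzymes both change by at least a threshold in the same direction, then report the connected components of the remaining graph as candidate adapted sub-graphs. This is not the authors' image-processing-based method; the threshold rule, the function name and the enzyme identifiers are hypothetical, and the fold changes in the example are invented values, not data from the study.

```python
from collections import defaultdict


def coherent_subgraphs(edges, log_fold_change, threshold):
    """Connected components formed by edges whose endpoints change together.

    An edge (a, b) is kept when both enzymes have |log fold change| >= threshold
    and their changes have the same sign (both up- or both down-regulated).
    """
    adjacency = defaultdict(set)
    for a, b in edges:
        fa = log_fold_change.get(a, 0.0)
        fb = log_fold_change.get(b, 0.0)
        if abs(fa) >= threshold and abs(fb) >= threshold and fa * fb > 0:
            adjacency[a].add(b)
            adjacency[b].add(a)

    seen, components = set(), []
    for start in sorted(adjacency):
        if start in seen:
            continue
        # Iterative depth-first search over the filtered graph.
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.add(node)
            stack.extend(adjacency[node] - seen)
        components.append(sorted(component))
    return components


# Hypothetical enzyme network and invented log2 fold changes.
edges = [("E1", "E2"), ("E2", "E3"), ("E3", "E4"), ("E4", "E5")]
log_fc = {"E1": 1.2, "E2": 1.5, "E3": 1.1, "E4": -2.0, "E5": -1.8}
print(coherent_subgraphs(edges, log_fc, threshold=1.0))
```

Here the E3-E4 edge is dropped because its endpoints change in opposite directions, which splits the chain into one up-regulated and one down-regulated sub-graph.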

    CGH-Profiler: Data mining based on genomic aberration profiles

    BACKGROUND: CGH-Profiler is a program that supports the analysis of genomic aberrations measured by comparative genomic hybridisation (CGH). CGH is a well-established molecular cytogenetic method that allows the detection of chromosomal imbalances across entire genomes, and it is widely used in routine molecular diagnostics. Typically, chromosomal imbalances are described in a complex syntax based on the International System for Human Cytogenetic Nomenclature (ISCN). This semantic description of chromosomal imbalances hinders large-scale statistical analysis across different experiments, e.g. finding aberration patterns associated with a particular disease type or state. RESULTS: CGH-Profiler circumvents the semantic ISCN description by importing data from different CGH system vendors and transferring the data directly into a table format that is readily accessible for subsequent statistical analysis. CGH-Profiler performs several consistency checks, calculates various statistics and automatically assigns a median copy number ratio to each chromosomal band. Import of CGH profiles from different CGH system vendors is already supported, and support for other systems can readily be added through Perl scripts. CGH-Profiler can also be used to analyse comparative expressed sequence hybridisation (CESH) data. CESH reveals gene expression patterns according to chromosomal location, in the same way that CGH detects chromosomal imbalances. CONCLUSION: CGH-Profiler is a useful tool for processing CGH and CESH data.
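Assigning a median copy number ratio to each chromosomal band can be sketched as follows, assuming ratio measurements given as (chromosome, position, ratio) tuples and bands given as half-open coordinate intervals. The data layout, the function name and the band coordinates are hypothetical and do not reflect CGH-Profiler's own import formats; bands without measurements receive `None`.

```python
from statistics import median


def band_medians(bands, measurements):
    """Median ratio per band.

    bands: list of (chromosome, band_name, start, end), intervals half-open [start, end).
    measurements: list of (chromosome, position, ratio).
    """
    result = {}
    for chrom, name, start, end in bands:
        ratios = [r for c, pos, r in measurements if c == chrom and start <= pos < end]
        # Bands without any measurement are reported as missing rather than as 1.0.
        result[name] = median(ratios) if ratios else None
    return result


# Hypothetical band coordinates and ratios, for illustration only.
bands = [("1", "1p36", 0, 100), ("1", "1p35", 100, 200), ("1", "1p34", 200, 300)]
measurements = [("1", 10, 1.0), ("1", 20, 1.5), ("1", 30, 2.0), ("1", 150, 0.5)]
print(band_medians(bands, measurements))
```

The median is robust to single outlying measurements within a band. Real band boundaries would come from a cytoband table, and the linear scan over measurements is chosen for clarity rather than speed.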

    New Insights into the Genetic Regulation of Plasmodium Falciparum Obtained by Bayesian Modeling

    The most fatal and prevalent form of malaria is caused by the bloodborne pathogen Plasmodium falciparum (henceforth P.f). Annually, approximately three million people die of malaria. Despite the devastating global impact of P.f, the vast majority of its proteins have not been characterized experimentally. In this work, we provide computational insights into the regulation of some important groups of P.f genes, namely components of the glycolytic pathway and genes involved in apicoplast metabolism. Glycolysis is a crucial pathway for the maintenance of the parasite, while the recently discovered apicoplast contains a range of metabolic pathways and housekeeping processes that differ radically from those of the host, which makes it an ideal target for drug therapy.